KNN and Re-ranking Models for English Patent Mining at NTCIR-7
Abstract
This paper describes our English patent mining system for the NTCIR-7 Patent Mining Task, which maps a research paper abstract into the IPC taxonomy. Our system is built on the k-nearest-neighbor (KNN) framework, within which various similarity calculation and ranking methods are used. We employ two re-ranking techniques that improve performance by using richer features. Our system performed well on the NTCIR-7 Patent Mining Task (English subtask) and obtained the best MAP score among all participants.
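To make the KNN framework concrete, the following is a minimal sketch and not the authors' system: it retrieves the k patents most similar to an abstract and ranks IPC codes by the summed similarity of the neighbors that carry them. The abstract does not specify the similarity measures, term weighting, or re-ranking features, so the plain term-frequency cosine similarity, the function names, the `(text, ipc_codes)` data layout, and the toy patents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_rank_ipc(abstract, patents, k=3):
    """Rank IPC codes by the summed similarity of the k nearest patents.

    `patents` is a list of (text, [ipc_codes]) pairs; this layout is a
    hypothetical choice for the sketch, not taken from the paper.
    """
    query = tf_vector(abstract)
    scored = [(cosine(query, tf_vector(text)), codes) for text, codes in patents]
    # Keep only the k most similar patents (the "nearest neighbors").
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    for similarity, codes in neighbors:
        for code in codes:
            votes[code] += similarity
    # Highest accumulated score first.
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up patents used only to exercise the sketch.
patents = [
    ("neural network image recognition", ["G06N"]),
    ("image recognition camera sensor", ["G06K", "H04N"]),
    ("chemical compound polymer synthesis", ["C08F"]),
]
ranking = knn_rank_ipc("image recognition with a neural network", patents, k=2)
for code, score in ranking:
    print(code, round(score, 3))
```

A re-ranking stage of the kind the abstract mentions would take a candidate list like `ranking` and reorder it using additional features; which features the authors used is not stated here, so that stage is left out of the sketch.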
Similar Resources
ICL at NTCIR-7: An Improved KNN Algorithm for Text Categorization
This paper describes our system for the NTCIR-7 Patent Mining Task, which sought to make automatic text classification pragmatic. Our system employs an improved KNN algorithm which makes a trade-off between effectiveness and time complexity. We have tried two distance metrics in our algorithm: cosine similarity and Euclidean distance. Evaluation results on NTCIR-7 test data show that the former one ...
On the Robustness of Document Re-Ranking Techniques: A Comparison of Label Propagation, KNN, and Relevance Feedback
This paper describes our work at the sixth NTCIR workshop on the subtask of C-C single language information retrieval. We compared label propagation (LP), K-nearest neighboring (KNN), and relevance feedback (RF) for document re-ranking and found that RF is a more robust technique for performance improvement, while LP and KNN are sensitive to the choice and the number of relevant documents for s...
Experiments on Cross-language and Patent Retrieval at NTCIR-3 Workshop
The Berkeley group participated in the cross-language retrieval task and the patent retrieval task at the third NTCIR workshop. This paper describes our experiments on cross-language and patent retrieval. We present an automatic relevance feedback procedure for a document ranking formula based on logistic regression, and a procedure for automatically extracting Chinese/Japanese translations of Eng...
Using the Multi-level Classification Method in the Patent Mining Task at NTCIR-7
A patent includes a great deal of practical technical information and plays an important role in promoting scientific development. Research on patent classification and retrieval has significant application value. A patent is a special technical text with a strict hierarchical classification system and a normalized structure, and there are a number of relations between patents and their consti...
Overview of the Patent Mining Task at the NTCIR-7 Workshop
This paper introduces the Patent Mining Task of the Seventh NTCIR Workshop and the test collections produced in this task. The task’s goal was the classification of research papers written in either Japanese or English in terms of the International Patent Classification (IPC) system, which is a global standard. For this task, 12 participant groups submitted 49 runs. In this paper, we also report...